A Hierarchical Thread Scheduler and Register File for Energy-efficient Throughput Processors
Authors
Abstract
Modern graphics processing units (GPUs) employ a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complex thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs. First, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Reducing the number of threads that the scheduler must consider each cycle improves the scheduler’s energy efficiency. Second, we propose replacing the monolithic register file found on modern designs with a hierarchical register file. We explore various tradeoffs for the hierarchy, including the number of levels in the hierarchy and the number of entries at each level. We consider both a hardware-managed caching scheme and a software-managed scheme, where the compiler is responsible for orchestrating all data movement within the register file hierarchy. Combined with a hierarchical register file, our two-level thread scheduler provides a further reduction in energy by only allocating entries in the upper levels of the register file hierarchy for active threads. Averaging across a variety of real-world graphics and compute workloads, the active thread count can be reduced by a factor of 4 with minimal impact on performance, and our most efficient three-level software-managed register file hierarchy reduces register file energy by 54%.
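The two-level scheduling idea described in the abstract can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the paper's implementation: the names `Thread`, `TwoLevelScheduler`, and `run` are hypothetical, each instruction is modeled as nothing more than a latency in cycles, and the 100-cycle threshold separating short operations from main-memory accesses is an arbitrary illustrative value. The per-cycle selection loop only examines the small active set; a thread that issues a long-latency operation is demoted to the pending set and promoted again once its wait has elapsed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    program: list      # hypothetical model: one latency (in cycles) per instruction
    pc: int = 0
    ready_at: int = 0  # first cycle at which the thread may issue again

    def done(self):
        return self.pc >= len(self.program)


class TwoLevelScheduler:
    def __init__(self, threads, active_size, long_latency=100):
        self.active_size = active_size
        self.long_latency = long_latency  # assumed threshold for "main memory" operations
        self.pending = deque(threads)     # large set: hides main memory latency
        self.active = []                  # small set: the only threads examined each cycle
        self.considered = 0               # total threads examined by the selection loop
        self._refill(0)

    def _refill(self, cycle):
        # Promote pending threads whose wait has elapsed, up to the active-set capacity.
        for _ in range(len(self.pending)):
            if len(self.active) >= self.active_size:
                break
            t = self.pending.popleft()
            if t.ready_at <= cycle:
                self.active.append(t)
            else:
                self.pending.append(t)

    def step(self, cycle):
        self._refill(cycle)
        self.considered += len(self.active)
        for i, t in enumerate(self.active):
            if t.ready_at <= cycle:
                latency = t.program[t.pc]
                t.pc += 1
                t.ready_at = cycle + latency
                if t.done():
                    self.active.pop(i)
                elif latency >= self.long_latency:
                    # Long-latency operation: demote the thread to the pending set.
                    self.active.pop(i)
                    self.pending.append(t)
                return t.tid
        return None  # no active thread was ready this cycle


def run(threads, active_size, max_cycles=100_000):
    sched = TwoLevelScheduler(threads, active_size)
    cycle = 0
    while (sched.active or sched.pending) and cycle < max_cycles:
        sched.step(cycle)
        cycle += 1
    return cycle, sched.considered
```

Calling `run(threads, active_size=k)` with different values of `k` gives a rough feel for the tradeoff the abstract describes: a smaller active set means fewer threads examined per cycle, at the risk of idle cycles when every active thread is waiting.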
Similar resources
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
This paper proposes and evaluates software techniques that increase register file utilization for simultaneous multithreading (SMT) processors. SMT processors require large register files to hold multiple thread contexts that can issue instructions, out of order, every cycle. By supporting better inter-thread sharing and management of physical registers, an SMT processor can reduce the number o...
Thesis - Vasileios Porpodas
Very Long Instruction Word (VLIW) processors are wide-issue statically scheduled processors. Instruction scheduling for these processors is performed by the compiler and is therefore a critical factor for its operation. Some VLIWs are clustered, a design that improves scalability to higher issue widths while improving energy efficiency and frequency. Their design is based on physically partitio...
An On-Chip Multiprocessor Architecture with a Non-Blocking Synchronization Mechanism
tive to superscalar architectures [5][8][12][13]. Strengths of an on-chip MP architecture are threefold. First, an MP can exploit different level parallelism, thread-level parallelism (TLP), in addition to ILP. Second, the complexity can be suppressed using simple processors. This ensures a high clock rate. Third, communication latency can be significantly reduced using an on-chip network. Thes...
The Named-State Register File
A register file is a critical resource of modern processors. Most hardware and software mechanisms to manage registers across procedure calls do not efficiently support multithreaded programs. To switch between parallel threads, a conventional processor must spill and reload thread contexts from registers to memory. If context switches are frequent and unpredictable, a large fraction of executi...
Energy Efficient Application Specific Banked Register Files
Register files account for a significant fraction of the power dissipation in modern RISC processors. Register file banking is an effective alternative to monolithic register files in embedded systems. We propose a profile-based technique to arrive at a customized energy-efficient bank configuration for a given application on a dual bank register file. The technique consists of a register renam...